This lab is adapted from “Human Genome Analysis Lab 9: Working with COVID-19 reporting data” by Jeffrey Blanchard.
- Access COVID-19 data remotely
- Understand the reports
- Create graphs and maps to visualize COVID-19 data
To install the libraries, run install.packages("package_name"). Once they are installed, you can load them into your R environment.
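As a minimal sketch of that workflow, the helper below checks which packages are not yet installed, so that `install.packages()` is only needed for those. The name `missing_packages` is hypothetical and not part of the lab; it relies only on base R's `requireNamespace()`.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Some of the packages loaded later in this lab
lab_packages <- c("tidyverse", "maps", "mapdata", "lubridate", "viridis")
to_install <- missing_packages(lab_packages)

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
print(to_install)
```

An empty result means every listed package is already available and can be loaded with `library()`.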
The name SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) was recently established for the virus on the basis of several phylogenetic analyses. The disease caused by the virus is known as COVID-19. In this lab we will work with the daily updated COVID-19 case counts (confirmed, deaths, and recoveries) from the Johns Hopkins University database. The researchers Ensheng Dong, Hongru Du, and Lauren Gardner developed a platform to monitor COVID-19 reports in real time.
The data are collected from the following sources: World Health Organization (WHO) | DXY.cn. Pneumonia. 2020 | BNO News | National Health Commission of the People’s Republic of China (NHC) | China CDC (CCDC) | Hong Kong Department of Health | Macau Government | Taiwan CDC | US CDC | Government of Canada | Australia Government Department of Health | European Centre for Disease Prevention and Control (ECDC) | Ministry of Health Singapore (MOH) | Italy Ministry of Health | 1Point3Acres | WorldoMeters
It is important to understand that the reported data depend on COVID-19 testing, so in many countries a number of cases may be going uncounted because of a shortage of tests. Even so, daily cases keep growing exponentially, steepening the infection curve. The idea of this lab is for you to share this information with friends and family, so that everyone understands the importance of preventive isolation and of the precautions that avoid contagion and flatten the infection curve. #SocialDistancing #COVID-19 #CuarentenaPreventiva
csse_covid_19_data | csse_covid_19_daily_reports | 03-24-2020.csv
These files contain several columns with the following information: Province/State, Country/Region, Last Update, Confirmed, Deaths, Recovered, Latitude, Longitude
csse_covid_19_data | csse_covid_19_time_series | time_series_covid19_confirmed_global.csv
These files contain columns with the following information: Province/State, Country/Region, Lat, Long, 1/22/20, 1/23/20, 1/24/20…
Next we will use the tidyverse and lubridate libraries to reshape the data. Follow these steps:
1. Go to the file you want to download
2. Click the “Raw” button at the top right
3. Choose “Save as”, or copy the URL
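Instead of copying each URL by hand, the raw-file address can be assembled from a date. The sketch below assumes the daily reports keep the `MM-DD-YYYY.csv` naming seen in this lab; the function name `daily_report_url` is hypothetical.

```r
# Hypothetical helper: build the raw-file URL for one daily report.
daily_report_url <- function(date) {
  base <- paste0(
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/",
    "csse_covid_19_data/csse_covid_19_daily_reports/"
  )
  # Daily report files are named MM-DD-YYYY.csv
  paste0(base, format(as.Date(date), "%m-%d-%Y"), ".csv")
}

# URL for the March 24, 2020 report
daily_report_url("2020-03-24")
```

The resulting string can be passed to `read_csv()` in place of a hard-coded URL.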
library(tidyverse)
library(maps)
library(mapdata)
library(lubridate)
library(viridis)
library(wesanderson)
We will create an object called report_03_24_2020 from the March 24, 2020 report.
report_03_24_2020 <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-24-2020.csv")) %>%
select(-FIPS, -Admin2)
## Parsed with column specification:
## cols(
## FIPS = col_character(),
## Admin2 = col_character(),
## Province_State = col_character(),
## Country_Region = col_character(),
## Last_Update = col_datetime(format = ""),
## Lat = col_double(),
## Long_ = col_double(),
## Confirmed = col_double(),
## Deaths = col_double(),
## Recovered = col_double(),
## Active = col_double(),
## Combined_Key = col_character()
## )
head(report_03_24_2020)
## # A tibble: 6 x 10
## Province_State Country_Region Last_Update Lat Long_ Confirmed
## <chr> <chr> <dttm> <dbl> <dbl> <dbl>
## 1 South Carolina US 2020-03-24 23:37:31 34.2 -82.5 1
## 2 Louisiana US 2020-03-24 23:37:31 30.3 -92.4 2
## 3 Virginia US 2020-03-24 23:37:31 37.8 -75.6 1
## 4 Idaho US 2020-03-24 23:37:31 43.5 -116. 19
## 5 Iowa US 2020-03-24 23:37:31 41.3 -94.5 1
## 6 Kentucky US 2020-03-24 23:37:31 37.1 -85.3 0
## # … with 4 more variables: Deaths <dbl>, Recovered <dbl>, Active <dbl>,
## # Combined_Key <chr>
str(report_03_24_2020)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 3417 obs. of 10 variables:
## $ Province_State: chr "South Carolina" "Louisiana" "Virginia" "Idaho" ...
## $ Country_Region: chr "US" "US" "US" "US" ...
## $ Last_Update : POSIXct, format: "2020-03-24 23:37:31" "2020-03-24 23:37:31" ...
## $ Lat : num 34.2 30.3 37.8 43.5 41.3 ...
## $ Long_ : num -82.5 -92.4 -75.6 -116.2 -94.5 ...
## $ Confirmed : num 1 2 1 19 1 0 1 0 25 0 ...
## $ Deaths : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Recovered : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Active : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Combined_Key : chr "Abbeville, South Carolina, US" "Acadia, Louisiana, US" "Accomack, Virginia, US" "Ada, Idaho, US" ...
## - attr(*, "spec")=
## .. cols(
## .. FIPS = col_character(),
## .. Admin2 = col_character(),
## .. Province_State = col_character(),
## .. Country_Region = col_character(),
## .. Last_Update = col_datetime(format = ""),
## .. Lat = col_double(),
## .. Long_ = col_double(),
## .. Confirmed = col_double(),
## .. Deaths = col_double(),
## .. Recovered = col_double(),
## .. Active = col_double(),
## .. Combined_Key = col_character()
## .. )
Reports for March 11, 2020
Some older reports use a different format. In this case we have to rename the columns Province/State and Country/Region.
report_03_11_2020 <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-11-2020.csv")) %>%
rename(Country_Region = "Country/Region", Province_State = "Province/State")
## Parsed with column specification:
## cols(
## `Province/State` = col_character(),
## `Country/Region` = col_character(),
## `Last Update` = col_datetime(format = ""),
## Confirmed = col_double(),
## Deaths = col_double(),
## Recovered = col_double(),
## Latitude = col_double(),
## Longitude = col_double()
## )
head(report_03_11_2020)
## # A tibble: 6 x 8
## Province_State Country_Region `Last Update` Confirmed Deaths Recovered
## <chr> <chr> <dttm> <dbl> <dbl> <dbl>
## 1 Hubei China 2020-03-11 10:53:02 67773 3046 49134
## 2 <NA> Italy 2020-03-11 21:33:02 12462 827 1045
## 3 <NA> Iran 2020-03-11 18:52:03 9000 354 2959
## 4 <NA> Korea, South 2020-03-11 21:13:18 7755 60 288
## 5 France France 2020-03-11 22:53:03 2281 48 12
## 6 <NA> Spain 2020-03-11 20:53:02 2277 54 183
## # … with 2 more variables: Latitude <dbl>, Longitude <dbl>
str(report_03_11_2020)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 216 obs. of 8 variables:
## $ Province_State: chr "Hubei" NA NA NA ...
## $ Country_Region: chr "China" "Italy" "Iran" "Korea, South" ...
## $ Last Update : POSIXct, format: "2020-03-11 10:53:02" "2020-03-11 21:33:02" ...
## $ Confirmed : num 67773 12462 9000 7755 2281 ...
## $ Deaths : num 3046 827 354 60 48 ...
## $ Recovered : num 49134 1045 2959 288 12 ...
## $ Latitude : num 31 43 32 36 46.2 ...
## $ Longitude : num 112.27 12 53 128 2.21 ...
## - attr(*, "spec")=
## .. cols(
## .. `Province/State` = col_character(),
## .. `Country/Region` = col_character(),
## .. `Last Update` = col_datetime(format = ""),
## .. Confirmed = col_double(),
## .. Deaths = col_double(),
## .. Recovered = col_double(),
## .. Latitude = col_double(),
## .. Longitude = col_double()
## .. )
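The two reports above differ in more than the two renamed columns: the older file also uses `Last Update`, `Latitude`, and `Longitude` where the newer one uses `Last_Update`, `Lat`, and `Long_`. One possible way to bring both formats to the same names is a lookup table, sketched here in base R. The helper name `harmonize_names` and the toy data frame are hypothetical.

```r
# Hypothetical helper: map the older column names to the newer ones.
harmonize_names <- function(df) {
  lookup <- c(
    "Province/State" = "Province_State",
    "Country/Region" = "Country_Region",
    "Last Update"    = "Last_Update",
    "Latitude"       = "Lat",
    "Longitude"      = "Long_"
  )
  old <- names(df) %in% names(lookup)
  names(df)[old] <- unname(lookup[names(df)[old]])
  df
}

# Toy data frame with the older headers (values are made up)
old_format <- data.frame(
  "Province/State" = "X", "Country/Region" = "Y", Confirmed = 1,
  check.names = FALSE
)
names(harmonize_names(old_format))
```

Columns that already use the newer names are left untouched, so the same helper can be applied to reports in either format.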
report_03_11_2020 %>%
filter(Country_Region == "US") %>%
group_by(Province_State) %>%
summarise(Confirmed = sum(Confirmed)) %>%
ggplot(aes(x = Confirmed, y = reorder(Province_State, Confirmed))) +
geom_point() +
ggtitle("Confirmed cases for each US State March 11") +
ylab("Province/State") +
xlab("Confirmed Cases")
report_03_24_2020 %>%
filter(Country_Region == "US") %>%
group_by(Province_State) %>%
summarise(Confirmed = sum(Confirmed)) %>%
ggplot(aes(x = Confirmed, y = reorder(Province_State, Confirmed))) +
geom_point() +
ggtitle("Confirmed cases for each US State March 24") +
ylab("Province/State") +
xlab("Confirmed Cases")
report_03_24_2020 %>%
filter(Country_Region == "China") %>%
group_by(Province_State) %>%
summarise(Confirmed = sum(Confirmed)) %>%
ggplot(aes(x = Confirmed, y = reorder(Province_State, Confirmed))) +
geom_point() +
ggtitle("Confirmed cases for each China region March 24") +
ylab("Province/State") +
xlab("Confirmed Cases")
Note that countries such as China and the US have multiple observations (states/regions), which is why this kind of plot is possible. Other countries, such as Colombia and Spain, have a single row, that is, the total number of cases for the country.
To plot the total number of cases per country, we therefore have to sum the values across each country's rows.
report_03_24_2020 %>%
group_by(Country_Region) %>%
summarise(Deaths = sum(Deaths)) %>%
arrange(desc(Deaths))
## # A tibble: 169 x 2
## Country_Region Deaths
## <chr> <dbl>
## 1 Italy 6820
## 2 China 3281
## 3 Spain 2808
## 4 Iran 1934
## 5 France 1102
## 6 US 706
## 7 United Kingdom 423
## 8 Netherlands 277
## 9 Germany 157
## 10 Belgium 122
## # … with 159 more rows
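To see what the per-country sum does on a small scale, here is a base R counterpart of the `group_by()` and `summarise()` step, using `aggregate()` instead of dplyr. The toy values are made up for illustration only.

```r
# Toy report (made-up values): two rows for one country, one row for another
toy <- data.frame(
  Country_Region = c("US", "US", "Spain"),
  Province_State = c("Idaho", "Iowa", NA),
  Deaths = c(1, 2, 5)
)

# Sum deaths over all rows that share a country
totals <- aggregate(Deaths ~ Country_Region, data = toy, FUN = sum)

# Largest totals first, like arrange(desc(Deaths))
totals[order(-totals$Deaths), ]
```

The two US rows collapse into a single total, while the single-row country keeps its original value.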
#Compare the values between the March 11 and March 24 reports. Which country has the most reported deaths on March 11, and which on March 24?
#Repeat the task with the confirmed cases, and separately with the recovered cases
report_03_24_2020 %>%
group_by(Country_Region) %>%
summarise(Deaths = sum(Deaths)) %>%
arrange(desc(Deaths)) %>%
slice(1:20) %>%
ggplot(aes(x = Deaths, y = reorder(Country_Region, Deaths))) +
geom_bar(stat = 'identity') +
ggtitle("The 20 countries with the most reported COVID-19-related deaths") +
ylab("Country/Region") +
xlab("Deaths")
Make at least two plots that differ from the examples.
time_series_confirmed <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
rename(Province_State = "Province/State", Country_Region = "Country/Region")
## Parsed with column specification:
## cols(
## .default = col_double(),
## `Province/State` = col_character(),
## `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
Inspect the table
head(time_series_confirmed)
## # A tibble: 6 x 75
## Province_State Country_Region Lat Long `1/22/20` `1/23/20` `1/24/20`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 <NA> Afghanistan 33 65 0 0 0
## 2 <NA> Albania 41.2 20.2 0 0 0
## 3 <NA> Algeria 28.0 1.66 0 0 0
## 4 <NA> Andorra 42.5 1.52 0 0 0
## 5 <NA> Angola -11.2 17.9 0 0 0
## 6 <NA> Antigua and B… 17.1 -61.8 0 0 0
## # … with 68 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>, `1/27/20` <dbl>,
## # `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>, `1/31/20` <dbl>,
## # `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>, `2/4/20` <dbl>,
## # `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>, `2/8/20` <dbl>,
## # `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>, `2/12/20` <dbl>,
## # `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>, `2/16/20` <dbl>,
## # `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>, `2/20/20` <dbl>,
## # `2/21/20` <dbl>, `2/22/20` <dbl>, `2/23/20` <dbl>, `2/24/20` <dbl>,
## # `2/25/20` <dbl>, `2/26/20` <dbl>, `2/27/20` <dbl>, `2/28/20` <dbl>,
## # `2/29/20` <dbl>, `3/1/20` <dbl>, `3/2/20` <dbl>, `3/3/20` <dbl>,
## # `3/4/20` <dbl>, `3/5/20` <dbl>, `3/6/20` <dbl>, `3/7/20` <dbl>,
## # `3/8/20` <dbl>, `3/9/20` <dbl>, `3/10/20` <dbl>, `3/11/20` <dbl>,
## # `3/12/20` <dbl>, `3/13/20` <dbl>, `3/14/20` <dbl>, `3/15/20` <dbl>,
## # `3/16/20` <dbl>, `3/17/20` <dbl>, `3/18/20` <dbl>, `3/19/20` <dbl>,
## # `3/20/20` <dbl>, `3/21/20` <dbl>, `3/22/20` <dbl>, `3/23/20` <dbl>,
## # `3/24/20` <dbl>, `3/25/20` <dbl>, `3/26/20` <dbl>, `3/27/20` <dbl>,
## # `3/28/20` <dbl>, `3/29/20` <dbl>, `3/30/20` <dbl>, `3/31/20` <dbl>,
## # `4/1/20` <dbl>
Here we need to transform the data frame for later plots. The format above is known as ‘wide format’ (many columns) and will be converted to ‘long format’ (many rows).
time_series_confirmed_long <- time_series_confirmed %>%
pivot_longer(-c(Province_State, Country_Region, Lat, Long),
names_to = "Date", values_to = "Confirmed")
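The reshaping can be easier to follow on a tiny table. The sketch below applies the same `pivot_longer()` call to two made-up countries and two dates, then parses the date labels with lubridate's `mdy()`. It assumes the tidyr and lubridate packages are installed.

```r
library(tidyr)
library(lubridate)

# Toy wide table (values are made up): one column per date
wide <- data.frame(
  Country_Region = c("A", "B"),
  "1/22/20" = c(0, 1),
  "1/23/20" = c(2, 3),
  check.names = FALSE
)

# Long format: one row per country and date
long <- pivot_longer(wide, -Country_Region,
                     names_to = "Date", values_to = "Confirmed")

# Column names arrive as text; mdy() parses month/day/year into Date values
long$Date <- mdy(long$Date)
long
```

Two rows with two date columns become four rows, one for each country and date pair.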
library(maps)
library(viridis)
world <- map_data("world")
mybreaks <- c(1, 20, 100, 1000, 50000)
ggplot() +
geom_polygon(data = world, aes(x=long, y = lat, group = group), fill="grey",colour = "black", alpha=0.3) +
geom_point(data=time_series_confirmed, aes(x=Long, y=Lat, size=`3/11/20`, color=`3/11/20`),stroke=F, alpha=0.7) +
scale_size_continuous(name="Cases", trans="log", range=c(1,7),breaks=mybreaks, labels = c("1-19", "20-99", "100-999", "1,000-49,999", "50,000+")) +
# scale_alpha_continuous(name="Cases", trans="log", range=c(0.1, 0.9),breaks=mybreaks) +
scale_color_viridis_c(option="inferno",name="Cases", trans="log",breaks=mybreaks, labels = c("1-19", "20-99", "100-999", "1,000-49,999", "50,000+")) +
theme_void() +
guides( colour = guide_legend()) +
labs(caption = "") +
theme(
legend.position = "bottom",
text = element_text(color = "#22211d"),
plot.background = element_rect(fill = "#ffffff", color = NA),
panel.background = element_rect(fill = "#ffffff", color = NA),
legend.background = element_rect(fill = "#ffffff", color = NA)
)+
ggtitle("Confirmed COVID-19 Cases March 11, 2020")
## Warning: Transformation introduced infinite values in discrete y-axis
## Warning: Transformation introduced infinite values in discrete y-axis
## Warning in sqrt(x): NaNs produced
## Warning: Removed 94 rows containing missing values (geom_point).
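These warnings most likely come from locations with zero cases on that date: the scales use `trans = "log"`, and the logarithm of zero is not finite. A minimal base R sketch of the effect, with made-up counts, and of one way to drop such rows before plotting:

```r
# Toy counts (made up); zero is a valid count but has no finite logarithm
cases <- data.frame(place = c("a", "b", "c"), confirmed = c(0, 12, 340))

# The first element is -Inf
log(cases$confirmed)

# Keep only rows that can be drawn on a log scale
plottable <- cases[!is.na(cases$confirmed) & cases$confirmed > 0, ]
plottable
```

Filtering this way is optional. ggplot2 drops the affected points on its own, which is what the "Removed ... rows" warning reports.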
daily_report_03_31_2020 <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-31-2020.csv")) %>%
rename(Long = "Long_") %>%
select(-Admin2, -FIPS)
ggplot(daily_report_03_31_2020, aes(x = Long, y = Lat, size = Confirmed/1000)) +
borders("world",fill = "grey90", colour=NA) + #additional option: colour = "black" instead of NA
theme_bw() +
geom_point(shape = 21, color='blue', fill='blue', alpha = 0.5) +
labs(title = 'World COVID-19 Confirmed cases by March 31, 2020',x = '', y = '',
size="Cases (x1000)") +
theme(legend.position = "right") +
coord_fixed(ratio=1.5)
## Warning: Removed 2 rows containing missing values (geom_point).
# Legend breaks for the US map
mybreaksUS <- c(1, 100, 1000, 10000, 50000)
daily_report_03_31_2020 %>%
filter(Country_Region == "US") %>%
filter (!Province_State %in% c("Alaska","Hawaii", "American Samoa",
"Puerto Rico","Northern Mariana Islands",
"Virgin Islands", "Recovered", "Guam", "Grand Princess",
"District of Columbia", "Diamond Princess")) %>%
filter(Lat > 0) %>%
ggplot(aes(x = Long, y = Lat, size = Confirmed)) +
borders("state", colour = "white", fill = "grey90") +
geom_point(aes(x=Long, y=Lat, size=Confirmed, color=Confirmed),stroke=F, alpha=0.7) +
scale_size_continuous(name="Cases", trans="log", range=c(1,7),
breaks=mybreaksUS, labels = c("1-99",
"100-999", "1,000-9,999", "10,000-49,999", "50,000+")) +
scale_color_viridis_c(option="viridis",name="Cases",
trans="log", breaks=mybreaksUS, labels = c("1-99",
"100-999", "1,000-9,999", "10,000-49,999", "50,000+")) +
# Cleaning up the graph
theme_void() +
guides( colour = guide_legend()) +
labs(title = "COVID-19 Confirmed Cases in the US by March 31, 2020") +
theme(
legend.position = "bottom",
text = element_text(color = "#22211d"),
plot.background = element_rect(fill = "#ffffff", color = NA),
panel.background = element_rect(fill = "#ffffff", color = NA),
legend.background = element_rect(fill = "#ffffff", color = NA)
) +
coord_fixed(ratio=1.5)
## Warning: Transformation introduced infinite values in discrete y-axis
## Warning: Transformation introduced infinite values in discrete y-axis
## Warning in sqrt(x): NaNs produced
## Warning: Removed 7 rows containing missing values (geom_point).
Plot the global map of confirmed cases for March 25, 2020. What differences do you find compared with the March maps shown above?
time_series_confirmed_long2 <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
rename(Province.State = "Province/State", Country.Region = "Country/Region") %>%
pivot_longer(-c(Province.State, Country.Region, Lat, Long),
names_to = "Date", values_to = "cumulative_cases") %>%
mutate(Date = mdy(Date) - days(1),
Place = paste(Lat,Long,sep="_")) %>%
group_by(Place,Date) %>%
summarise(cumulative_cases = ifelse(sum(cumulative_cases)>0,
sum(cumulative_cases),NA_real_),
Lat = mean(Lat),
Long = mean(Long)) %>%
mutate(Pandemic_day = as.numeric(Date - min(Date)))
## Parsed with column specification:
## cols(
## .default = col_double(),
## `Province/State` = col_character(),
## `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
ggplot(subset(time_series_confirmed_long2, Date %in% seq(min(Date),max(Date),7)),
aes(x = Long, y = Lat, size = cumulative_cases/1000)) +
borders("world", colour = NA, fill = "grey90") +
theme_bw() +
geom_point(shape = 21, color='purple', fill='purple', alpha = 0.5) +
labs(title = 'COVID-19 spread',x = '', y = '',
size="Cases (x1000)") +
theme(legend.position = "right") +
coord_fixed(ratio=1)+
facet_wrap(.~Date,nrow=3)
## Warning: Removed 1415 rows containing missing values (geom_point).
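The faceted map keeps one snapshot per week through `Date %in% seq(min(Date), max(Date), 7)`. The base R sketch below shows how that filter behaves on a short, made-up run of daily dates.

```r
# Made-up run of daily dates covering three weeks
dates <- seq(as.Date("2020-01-21"), as.Date("2020-02-11"), by = "day")

# Every seventh day, starting from the first date
weekly <- seq(min(dates), max(dates), 7)
weekly

# The same filter used in the plot: keep only dates on the weekly grid
dates[dates %in% weekly]
```

Changing the step from 7 to another number of days changes how many panels `facet_wrap()` has to draw.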
#Latin countries
some.latin <- c("Colombia","Brazil","Peru","Ecuador","Chile","Venezuela","Bolivia","Argentina","Paraguay","Uruguay")
#retrieve map data
some.latin <- map_data("world", region = some.latin)
#Coordinates of countries
region.coord.data <- some.latin %>%
group_by(region) %>%
summarise(long = mean(long), lat = mean(lat))
#Confirmed cases of COVID-19 in Latin America
time_series_confirmed_latin <- time_series_confirmed %>%
filter (Country_Region %in% c("Colombia","Brazil","Peru","Ecuador","Chile","Venezuela","Bolivia","Argentina","Paraguay","Uruguay"))
#You can plot data of recoveries, deaths and confirmed cases separately
mybreaks3 <- c(1, 50, 100, 200, 300, 1000, 2000)
ggplot() +
geom_polygon(data = some.latin, aes(x=long, y = lat, group = group), fill="grey", alpha=0.3,colour='black') + #additional option colour="white"
geom_point(data=time_series_confirmed_latin, aes(x=Long, y=Lat, size=`3/24/20`, color=`3/24/20`),stroke=F, alpha=0.7)+
scale_size_continuous(name="Cases", trans="log", range=c(1,20),breaks=mybreaks3, labels = c("1-49", "50-99", "100-199", "200-299", "300-999", "1000-1999",'2000+')) +
# scale_alpha_continuous(name="Cases", trans="log", range=c(0.1, 0.9),breaks=mybreaks) +
scale_color_viridis_c(option="inferno",name="Cases", trans="log",breaks=mybreaks3, labels = c("1-49","50-99", "100-199", "200-299", "300-999", "1000-1999",'2000+')) +
theme_void() +
guides( colour = guide_legend()) +
labs(caption = "") +
theme(
legend.position = "bottom",
legend.text = element_text(size = 20),
legend.title = element_text(size = 20),
text = element_text(color = "#22211d"),
plot.background = element_rect(fill = "#ffffff", color = NA),
panel.background = element_rect(fill = "#ffffff", color = NA),
legend.background = element_rect(fill = "#ffffff", color = NA))+
geom_text(aes(x=long, y=lat,label=region), data = region.coord.data, size =5, hjust=0.5)+
ggtitle("COVID-19 Confirmed Cases in South America reported on March 24/20")
library(ggplot2)
library(gganimate)
library(transformr)
theme_set(theme_bw())
#install.packages("sf",dependencies = T)
#install.packages("devtools")
#devtools::install_github("thomasp85/transformr")
time_series_confirmed_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
rename(Province_State = "Province/State", Country_Region = "Country/Region") %>%
pivot_longer(-c(Province_State, Country_Region, Lat, Long),
names_to = "Date", values_to = "Confirmed")
# Let's get the times series data for deaths
time_series_deaths_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")) %>%
rename(Province_State = "Province/State", Country_Region = "Country/Region") %>%
pivot_longer(-c(Province_State, Country_Region, Lat, Long),
names_to = "Date", values_to = "Deaths")
time_series_recovered_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv")) %>%
rename(Province_State = "Province/State", Country_Region = "Country/Region") %>%
pivot_longer(-c(Province_State, Country_Region, Lat, Long),
names_to = "Date", values_to = "Recovered")
# Create Keys
time_series_confirmed_long <- time_series_confirmed_long %>%
unite(Key, Province_State, Country_Region, Date, sep = ".", remove = FALSE)
time_series_deaths_long <- time_series_deaths_long %>%
unite(Key, Province_State, Country_Region, Date, sep = ".") %>%
select(Key, Deaths)
time_series_recovered_long <- time_series_recovered_long %>%
unite(Key, Province_State, Country_Region, Date, sep = ".") %>%
select(Key, Recovered)
# Join tables
time_series_long_joined <- full_join(time_series_confirmed_long,
time_series_deaths_long, by = c("Key"))
time_series_long_joined <- full_join(time_series_long_joined,
time_series_recovered_long, by = c("Key")) %>%
select(-Key)
# Reformat the data
time_series_long_joined$Date <- mdy(time_series_long_joined$Date)
# Create Report table with counts
time_series_long_joined_counts <- time_series_long_joined %>%
pivot_longer(-c(Province_State, Country_Region, Lat, Long, Date),
names_to = "Report_Type", values_to = "Counts")
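The `Key` column built above is one way to match rows across the three tables. As an alternative sketch, not the approach used in this lab, `full_join()` can match on several columns directly. By default dplyr joins treat missing keys as equal to each other, so rows without a `Province_State` still line up. The toy values are made up, and the example assumes dplyr is installed.

```r
library(dplyr)

# Toy long tables (made-up values); one row has no Province_State
confirmed <- data.frame(
  Province_State = c(NA, "Hubei"),
  Country_Region = c("Italy", "China"),
  Date = c("3/11/20", "3/11/20"),
  Confirmed = c(10, 20)
)
deaths <- data.frame(
  Province_State = c(NA, "Hubei"),
  Country_Region = c("Italy", "China"),
  Date = c("3/11/20", "3/11/20"),
  Deaths = c(1, 2)
)

# Join on the three identifying columns instead of a pasted key
joined <- full_join(confirmed, deaths,
                    by = c("Province_State", "Country_Region", "Date"))
joined
```

This avoids creating and later dropping the helper column, at the cost of listing the identifying columns in every join.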
data_time <- time_series_long_joined %>%
group_by(Country_Region,Date) %>%
summarise_at(c("Confirmed", "Deaths", "Recovered"), sum) %>%
filter (Country_Region %in% c("China","Korea, South","Japan","US", "Spain", "Italy"))
p <- ggplot(data_time, aes(x = Date, y = Confirmed, color = Country_Region)) +
geom_point() +
geom_line() +
ggtitle("Confirmed COVID-19 Cases") +
geom_point(aes(group = seq_along(Date)))
p
data_time <- time_series_long_joined %>%
group_by(Country_Region,Date) %>%
summarise_at(c("Confirmed", "Deaths", "Recovered"), sum) %>%
filter (Country_Region %in% c("Colombia","Brazil","Chile", "Venezuela","Ecuador", "Mexico"))
p <- ggplot(data_time, aes(x = Date, y = Confirmed, color = Country_Region)) +
geom_point() +
geom_line() +
ggtitle("Confirmed COVID-19 Cases") +
geom_point(aes(group = seq_along(Date)))
p
Animation
data_time <- time_series_long_joined %>%
group_by(Country_Region,Date) %>%
summarise_at(c("Confirmed", "Deaths", "Recovered"), sum) %>%
filter (Country_Region %in% c("China","Korea, South","Japan","US"))
p <- ggplot(data_time, aes(x = Date, y = Confirmed, color = Country_Region)) +
geom_point() +
geom_line() +
ggtitle("Confirmed COVID-19 Cases") +
geom_point(aes(group = seq_along(Date))) +
transition_reveal(Date)
animate(p, end_pause = 15)
## Warning: No renderer available. Please install the gifski, av, or magick package
## to create animated output
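The warning means that none of the packages gganimate can use for rendering was found, so no animation file is produced. A small base R check, sketched below, reports which of the three packages named in the warning is installed. The helper name `available_renderer` is hypothetical.

```r
# Hypothetical helper: name of the first installed renderer package, or NA.
available_renderer <- function(candidates = c("gifski", "av", "magick")) {
  for (pkg in candidates) {
    if (requireNamespace(pkg, quietly = TRUE)) return(pkg)
  }
  NA_character_
}

backend <- available_renderer()
if (is.na(backend)) {
  message("No renderer found; try install.packages(\"gifski\")")
} else {
  message("Renderer package available: ", backend)
}
```

After installing one of these packages, rerunning `animate(p, end_pause = 15)` should produce the animation instead of the warning.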
covid <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
rename(Province_State= "Province/State", Country_Region = "Country/Region") %>%
pivot_longer(-c(Province_State, Country_Region, Lat, Long),
names_to = "Date", values_to = "Confirmed") %>%
mutate(Date = mdy(Date) - days(1),
Place = paste(Lat,Long,sep="_")) %>%
# Summarizes state and province information
group_by(Place,Date) %>%
summarise(cumulative_cases = ifelse(sum(Confirmed)>0,
sum(Confirmed),NA_real_),
Lat = mean(Lat),
Long = mean(Long)) %>%
mutate(Pandemic_day = as.numeric(Date - min(Date)))
## Parsed with column specification:
## cols(
## .default = col_double(),
## `Province/State` = col_character(),
## `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
world <- ggplot(covid,aes(x = Long, y = Lat, size = cumulative_cases/1000)) +
borders("world", colour = "gray50", fill = "grey90") +
theme_bw() +
geom_point(color='purple', alpha = .5) +
labs(title = 'Pandemic Day: {frame}',x = '', y = '',
size="Cases (x1000)") +
theme(legend.position = "right") +
coord_fixed(ratio=1.3)+
transition_time(Date) +
enter_fade()
animate(world, end_pause = 30)
## Warning: No renderer available. Please install the gifski, av, or magick package
## to create animated output